Exploration server on the TEI

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Transcriber: Development and use of a tool for assisting speech corpora production

Internal identifier: 000286 (Main/Exploration); previous: 000285; next: 000287


Authors: Claude Barras [France]; Edouard Geoffrois [France]; Zhibiao Wu [United States]; Mark Liberman [United States]


RBID : ISTEX:451C86626179A17CCC13ACCCE699A1A2EBDFA996

Abstract

We present “Transcriber”, a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and a better management of conversational speech. Using the annotation graphs framework recently formalized, adaptation of the tool towards new tasks and support of different data formats will become easier.

DOI: 10.1016/S0167-6393(00)00067-4



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Transcriber: Development and use of a tool for assisting speech corpora production</title>
<author>
<name sortKey="Barras, Claude" sort="Barras, Claude" uniqKey="Barras C" first="Claude" last="Barras">Claude Barras</name>
</author>
<author>
<name sortKey="Geoffrois, Edouard" sort="Geoffrois, Edouard" uniqKey="Geoffrois E" first="Edouard" last="Geoffrois">Edouard Geoffrois</name>
</author>
<author>
<name sortKey="Wu, Zhibiao" sort="Wu, Zhibiao" uniqKey="Wu Z" first="Zhibiao" last="Wu">Zhibiao Wu</name>
</author>
<author>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:451C86626179A17CCC13ACCCE699A1A2EBDFA996</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0167-6393(00)00067-4</idno>
<idno type="url">https://api.istex.fr/document/451C86626179A17CCC13ACCCE699A1A2EBDFA996/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000477</idno>
<idno type="wicri:Area/Istex/Curation">000477</idno>
<idno type="wicri:Area/Istex/Checkpoint">000236</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000236</idno>
<idno type="wicri:doubleKey">0167-6393:2001:Barras C:transcriber:development:and</idno>
<idno type="wicri:Area/Main/Merge">000310</idno>
<idno type="wicri:Area/Main/Curation">000286</idno>
<idno type="wicri:Area/Main/Exploration">000286</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Transcriber: Development and use of a tool for assisting speech corpora production</title>
<author>
<name sortKey="Barras, Claude" sort="Barras, Claude" uniqKey="Barras C" first="Claude" last="Barras">Claude Barras</name>
<affiliation wicri:level="3">
<country xml:lang="fr">France</country>
<wicri:regionArea>Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Orsay</settlement>
</placeName>
</affiliation>
<affiliation></affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Geoffrois, Edouard" sort="Geoffrois, Edouard" uniqKey="Geoffrois E" first="Edouard" last="Geoffrois">Edouard Geoffrois</name>
<affiliation wicri:level="3">
<country xml:lang="fr">France</country>
<wicri:regionArea>DGA/CTA/GIP, 16 bis av. Prieur de la Côte d'Or, 94114 Arcueil Cedex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Arcueil</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Wu, Zhibiao" sort="Wu, Zhibiao" uniqKey="Wu Z" first="Zhibiao" last="Wu">Zhibiao Wu</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>LDC, 3615 Market Street, Suite 200, Philadelphia, PA 19104-2608</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>LDC, 3615 Market Street, Suite 200, Philadelphia, PA 19104-2608</wicri:regionArea>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Speech Communication</title>
<title level="j" type="abbrev">SPECOM</title>
<idno type="ISSN">0167-6393</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">33</biblScope>
<biblScope unit="issue">1–2</biblScope>
<biblScope unit="page" from="5">5</biblScope>
<biblScope unit="page" to="22">22</biblScope>
</imprint>
<idno type="ISSN">0167-6393</idno>
</series>
<idno type="istex">451C86626179A17CCC13ACCCE699A1A2EBDFA996</idno>
<idno type="DOI">10.1016/S0167-6393(00)00067-4</idno>
<idno type="PII">S0167-6393(00)00067-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0167-6393</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We present “Transcriber”, a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and a better management of conversational speech. Using the annotation graphs framework recently formalized, adaptation of the tool towards new tasks and support of different data formats will become easier.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
<li>Île-de-France</li>
</region>
<settlement>
<li>Arcueil</li>
<li>Orsay</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Île-de-France">
<name sortKey="Barras, Claude" sort="Barras, Claude" uniqKey="Barras C" first="Claude" last="Barras">Claude Barras</name>
</region>
<name sortKey="Barras, Claude" sort="Barras, Claude" uniqKey="Barras C" first="Claude" last="Barras">Claude Barras</name>
<name sortKey="Geoffrois, Edouard" sort="Geoffrois, Edouard" uniqKey="Geoffrois E" first="Edouard" last="Geoffrois">Edouard Geoffrois</name>
</country>
<country name="États-Unis">
<region name="Pennsylvanie">
<name sortKey="Wu, Zhibiao" sort="Wu, Zhibiao" uniqKey="Wu Z" first="Zhibiao" last="Wu">Zhibiao Wu</name>
</region>
<name sortKey="Liberman, Mark" sort="Liberman, Mark" uniqKey="Liberman M" first="Mark" last="Liberman">Mark Liberman</name>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000286 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000286 | SxmlIndent | more
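Outside a Dilib installation, the record can also be read with a general-purpose XML library. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module; it is not part of Dilib. One caveat: the record uses the `wicri:` prefix (e.g. `wicri:istexFullTextTei`) without declaring it, so a namespace-aware parser such as ElementTree rejects the text as-is with an "unbound prefix" error. The sketch works around this by binding the prefix on a wrapper element to `urn:example:wicri`, which is a placeholder URI assumed for illustration, not the official Wicri namespace. The helper names `parse_record` and `summarize` are hypothetical, and `RECORD` is an abridged excerpt of the record above.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt of the record above; the undeclared wicri:* attribute is kept on purpose.
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Transcriber: Development and use of a tool for assisting speech corpora production</title>
<author><name first="Claude" last="Barras">Claude Barras</name></author>
<author><name first="Edouard" last="Geoffrois">Edouard Geoffrois</name></author>
<author><name first="Zhibiao" last="Wu">Zhibiao Wu</name></author>
<author><name first="Mark" last="Liberman">Mark Liberman</name></author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1016/S0167-6393(00)00067-4</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI (an assumption): the record never declares the wicri: prefix,
# so we bind it on a wrapper element to make the text namespace-well-formed.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Parse a Wicri record after binding the undeclared wicri: prefix (hypothetical helper)."""
    wrapper = ET.fromstring(f'<wrapper xmlns:wicri="{WICRI_NS}">{text}</wrapper>')
    return wrapper.find("record")


def summarize(record: ET.Element) -> dict:
    """Extract the title, author names and DOI from the TEI header (hypothetical helper)."""
    file_desc = record.find("TEI/teiHeader/fileDesc")
    return {
        "title": file_desc.findtext("titleStmt/title"),
        "authors": [name.text for name in file_desc.findall("titleStmt/author/name")],
        "doi": file_desc.findtext("publicationStmt/idno[@type='doi']"),
    }


if __name__ == "__main__":
    info = summarize(parse_record(RECORD))
    print(info["title"])
    print("; ".join(info["authors"]))
    print(info["doi"])
```

Wrapping the text rather than editing it leaves the record content untouched; after parsing, the `wicri:` attributes are available under the placeholder namespace, e.g. `{urn:example:wicri}istexFullTextTei`.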

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:451C86626179A17CCC13ACCCE699A1A2EBDFA996
   |texte=   Transcriber: Development and use of a tool for assisting speech corpora production
}}

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024